Text File | 1992-05-23 | 6KB | 182 lines

MARC FORMAT RECORDS
===================

Prepared by Doug Lowry
June 13, 1986


OBJECTIVE:
=========

To examine the structure of MARC format records with a view to
writing a preprocessor which will create standard format records
from MARC records.


LIMITATIONS:
===========

What follows is not an exhaustive study.  Harvey Martens has done a
preliminary analysis.  I have carried it a few steps further, aided
by discussion with Michael xxxxxx of the National Research Council
(613 xxx-xxxx) on June 12.


BASIC RECORD STRUCTURE:
======================

MARC records occur in blocks.  Each block is preceded by a 4 byte
value; the first two bytes are the high and low order bytes
respectively of the length of the block.  The next two bytes are
each null.  For example, octal values 026 270 000 000 indicate a
block length of 5816 bytes.  (It is not yet clear whether this count
includes or excludes the four bytes for the block length.)

An individual MARC record consists of these components:

    1.  A 4 byte record length indicator
    2.  A 24 byte leader
    3.  A record directory or entries map
    4.  Control fields and variable fields
    5.  A group separator character


RECORD LENGTH INDICATOR:
=======================

Each MARC record is preceded by 4 bytes... the high order byte and
the low order byte respectively of the record length in bytes, then
two null bytes.  For example, octal 001 366 000 000 indicate a record
length of 502 bytes.  This count includes the four byte indicator.


RECORD LEADER:
=============

24 bytes as follows:

    1...5     ASCII record length in bytes, EXCLUDING the four byte
              record length indicator above.
    6         Record status letter (N= new, C= correction,
              D= deletion, ...)
    7         Type (codes not currently known)
    8         Bibliographic category (A= analytic, M= monograph,
              S= serial, ...)
    11        Indicator count (uncertain... not immediately relevant)
    13...17   Seems to be an ASCII count of the number of bytes in
              the following directory entries map.
    18...24   Uncertain


RECORD DIRECTORY:
================

A record directory consists of a series of ASCII numeric values:

    3 byte field number
    4 byte inclusive length of field in bytes
    5 byte offset in bytes from beginning of field data

The field numbers in the examples examined so far appear in numeric
order within the directory.  A field number may occur more than once.

The location of the data appears in near random order (possibly the
order in which fields were added).  Note the offsets in the following
real example:

    Field   Length   Offset

    008     0039     00000
    009     0032     00284
    022     0025     00134
    035     0030     00104
    088     0007     00229
    089     0036     00236
    090     0012     00272
    100     0014     00159
    245     0041     00063
    260     0008     00039
    260     0008     00047
    300     0008     00055
    410     0056     00173
    RS

A close examination of the directory shows that it is arithmetically
coherent.  For example, at offset 00000 above, there is something 39
bytes long.  Sure enough, the next lowest offset is 00039.  There are
8 bytes in that field, and the next offset is 00047, etc.

A directory is terminated by an "RS" or record separator byte
(octal 036).


FIELD CONTENTS:
==============

Fields are of two types...

    control fields, numbered 001...009
    variable fields, numbered 010...999

Control fields are fixed format and specialized.  For now, control
fields other than 009 can be ignored.  Field 008 is usually present,
but its contents are duplicated in 009.  We will treat 009 as if it
were a variable field, except that non-printing characters should be
replaced by white space.

Variable fields are essentially free text.  Sub fields may exist
within a field.  For now, we can indicate new sub fields by replacing
the separators by a newline symbol in the compressed text.

All fields (except the first) and all sub fields begin with a "US"
(unit separator, octal 037) followed by a single lower case
character.  The significance of different characters is not yet
known.

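
The "US" plus code letter convention described above can be
illustrated with a minimal Python sketch.  The function name
split_subfields is hypothetical, and decoding the text as Latin-1 is
an assumption made only so that every byte value decodes; the memo
does not say which character set the records use.

```python
US = b"\x1f"  # unit separator, octal 037
RS = b"\x1e"  # record separator, octal 036


def split_subfields(field):
    """Split one field into (code letter, text) pairs.

    Text found before the first US, if any, is returned with an
    empty code letter.
    """
    parts = field.rstrip(RS).split(US)
    pairs = []
    if parts[0]:
        # Assumption: Latin-1, chosen only because it never fails.
        pairs.append(("", parts[0].decode("latin-1")))
    for part in parts[1:]:
        if part:
            # First byte after US is the code letter, the rest is text.
            pairs.append((chr(part[0]), part[1:].decode("latin-1")))
    return pairs
```

Keeping the code letter next to its text, rather than discarding it,
leaves room to interpret the letters later once their meaning is
known.
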
We do know that it is safe to collapse out the "US" byte and the
following byte as white space for preprocessing, and to replace them
by a newline for creating compressed text.

All fields end with an "RS" 036 record separator.


END OF RECORD:
=============

Records are terminated by a single "GS" byte (group separator,
octal 035).


ADDITIONAL NOTES:
================

Consistent order within records would be reasonably assured if we
extract data in field number order per the directory rather than by
actual occurrence within the record.  For preprocessing, this would
mean that at least the full record would be needed in RAM before
preprocessing it.

The list of variable field names may be extracted from "Composite
MARC Format", a tabular listing which has been ordered.  Individual
organizations may utilize unnamed fields; until they provide the
names, a name such as "Field 089" could be used.
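

The record layout described in this memo can be sketched in Python as
follows.  This is an illustration under stated assumptions, not the
preprocessor itself: all function names are hypothetical, the record
is assumed to be held whole in memory, and the end of the directory
is found by scanning for the RS terminator rather than by trusting
leader bytes 13...17, whose meaning is still uncertain.

```python
RS = 0x1E  # record separator, octal 036
GS = 0x1D  # group separator, octal 035
ENTRY_LEN = 3 + 4 + 5  # field number, length, offset


def read_length_prefix(buf):
    """Decode a 4 byte length: high byte, low byte, then two nulls."""
    if buf[2] != 0 or buf[3] != 0:
        raise ValueError("expected two null bytes after the length")
    return (buf[0] << 8) | buf[1]


def parse_directory(directory):
    """Split the ASCII directory into (field, length, offset) tuples."""
    if len(directory) % ENTRY_LEN:
        raise ValueError("directory is not a whole number of entries")
    entries = []
    for i in range(0, len(directory), ENTRY_LEN):
        field = directory[i:i + 3].decode("ascii")
        length = int(directory[i + 3:i + 7])
        offset = int(directory[i + 7:i + 12])
        entries.append((field, length, offset))
    return entries


def is_coherent(entries):
    """True if the fields tile the data area with no gaps or overlaps."""
    expected = 0
    for _field, length, offset in sorted(entries, key=lambda e: e[2]):
        if offset != expected:
            return False
        expected += length
    return True


def parse_record(buf):
    """Parse one record that starts with its 4 byte length indicator.

    Returns (leader, [(field number, field bytes), ...]) with the
    fields in directory order, i.e. field number order.
    """
    total = read_length_prefix(buf)      # includes the indicator itself
    record = buf[4:total]
    leader = record[:24]
    if int(leader[:5]) != total - 4:     # ASCII length excludes indicator
        raise ValueError("leader length disagrees with the indicator")
    if record[-1] != GS:
        raise ValueError("record does not end with GS")
    dir_end = record.index(RS, 24)       # RS terminates the directory
    entries = parse_directory(record[24:dir_end])
    data = record[dir_end + 1:-1]        # field data, without the GS
    # Drop the trailing RS from each field, if present.
    fields = [(f, data[o:o + n].rstrip(b"\x1e")) for f, n, o in entries]
    return leader, fields
```

With the octal examples given earlier, read_length_prefix returns
5816 for bytes 026 270 000 000 and 502 for bytes 001 366 000 000.
Applying is_coherent to the thirteen directory entries listed under
RECORD DIRECTORY confirms the arithmetic check described there.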